Web text mining applies techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management to process and analyze unstructured or semi-structured web text and to extract valuable knowledge from it.
Although some information is stored in structured form, a great deal of valuable information is unstructured and written in natural language.
This establishes that the research object is unstructured free text and that the research goal is to extract knowledge from it, drawing on techniques from natural language processing, text mining, and related fields.
Extracting the information and knowledge we need from large-scale unstructured text, quickly and effectively, has become both a research focus and a challenge in natural language processing.
The main purpose of information extraction is to transform unstructured natural language text into semi-structured or structured data, so that people can obtain key information quickly and accurately.
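As a rough illustration of turning unstructured text into structured data, the following minimal sketch pulls drug-dose mentions out of a free-text note using a regular expression. The pattern, the `extract_doses` helper, the unit list, and the sample note are all assumptions made for this example; real information extraction systems typically rely on far richer linguistic analysis.

```python
import re

# Hypothetical pattern: a word, a numeric amount, and a unit, e.g. "Aspirin 100 mg".
# The unit list (mg, g, ml) is an assumption chosen for this illustration.
DOSE_PATTERN = re.compile(
    r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|g|ml)\b",
    re.IGNORECASE,
)


def extract_doses(text):
    """Return one structured record per drug-dose mention found in free text."""
    return [
        {
            "drug": match.group(1).lower(),
            "amount": float(match.group(2)),
            "unit": match.group(3).lower(),
        }
        for match in DOSE_PATTERN.finditer(text)
    ]


# Invented sample note, used only to exercise the sketch.
note = "Patient was given Aspirin 100 mg daily and later metformin 0.5 g twice a day."
records = extract_doses(note)
for record in records:
    print(record)
```

Each record is a small dictionary with `drug`, `amount`, and `unit` fields, which is the kind of structured form that can be stored in a table or queried directly.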
As a complete and detailed clinical information resource produced and recorded over a patient's successive visits to medical institutions, the structured electronic medical record also contains a large amount of unstructured text, such as clinical manifestations recorded in natural language.